Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

Similar resources

Optimizing Matrix Multiplication on Heterogeneous Reconfigurable Systems

© 2007 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher ment...

Efficient Bit-Serial Constant Multiplication for FPGAs

This paper describes how to realize place-effective synchronous bit-serial constant multiplications, which can be efficiently used for a block of constants (Multiple Constant Multiplication). The architecture combines traditional concepts and new approaches, which makes simultaneous fast multiplications of different input values possible. Fast multiplications are a core for up-to-da...

A Bit-Serial Cell for Reconfigurable Hardware

This paper introduces a novel bit-serial cell for reconfigurable hardware used to perform digital signal processing. The cell contains an array of 4-bit lookup tables, or “elements”, that can operate in two modes. In memory mode, the elements behave as a random-access memory. In mathematics mode, the elements perform operations such as multiply-accumulate, addition, and shifting in bit-serial f...

Bit-serial architecture for optical computing.

The design of a complete, stored-program digital optical computer is described. A fully functional, proof-of-principle prototype can be achieved by using LiNbO₃ directional couplers as logic elements and fiber-optic delay lines as memory elements. The key design issues are computation in a realm where propagation delays are much greater than logic delays and implementation of circuits without...

Optimizing Matrix-Matrix Multiplication for an Embedded VLIW Processor

The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...

Journal

Journal title: ACM Transactions on Reconfigurable Technology and Systems

Year: 2019

ISSN: 1936-7406, 1936-7414

DOI: 10.1145/3337929